Good afternoon. I’m Vito of the Legitimate Business Syndicate, and I’ve been invited here by our wonderful hosts to talk about our time running the Capture the Flag contest at DEF CON. 


Capture the Flag, or “CTF,” is a broad category of computer hacking contests. They generally involve interacting with a computer system of some kind, recovering a piece of data, also known as a flag, and redeeming that flag for points.


DEF CON is the world’s largest computer hacking conference. It’s held every summer in Las Vegas, the famous city of casinos in the middle of a desert in the US, and was attended by 25,000 people last year.


DEF CON has hosted a Capture the Flag event since 1996. Originally, it was an extremely flexible contest, with competitors either bringing a new service or attacking an existing one, which placed a heavy workload on the human judges. Since then, the contest at DEF CON has stabilized into two distinct games, following two distinct formats.


Jeopardy-style contests are by far the most popular kind of CTF, because they’re easy to run well and easily support hundreds of competitors. Organizers do all the difficult and fiddly work of making the game run reliably, and players just solve challenges.


DEF CON qualifiers are Jeopardy-style contests.


Jeopardy-style games are named after the US trivia TV game show. Teams connect to the scoring system and are presented with prompts organized into categories. They pick an available prompt and work on solving it. This may be something as quick as a trivia question, but it’s usually a hacking challenge.


Last summer, I played in the “SHA 2017 CTF,” run by the Dutch team Eindbazen. It was a Jeopardy-style game featuring a challenge called “asby.” The first step was to get the executable from the scoreboard and determine that it was a Windows console executable. After spending some time learning how to run Windows binaries on a Mac (I used a Docker image with Wine installed), I gave it a run.


It asks for the flag on launch, and depending on what text you give it, it tells you how much of the flag you have right.


From here, I saw three ways to solve it: reverse engineer a binary for a platform I could barely run it on, solve it by hand character by character, or write a script to solve it. I mostly went with the third option; only mostly, because like any program, a solve script can have bugs.
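
A minimal Python sketch of the character-by-character approach, not the script used during the contest. It assumes the program's answer can be reduced to "how many leading characters are right." The names SECRET, check, and solve are hypothetical, and check stands in for actually running the Windows executable and reading its output.

```python
import string

# Hypothetical stand-in for the flag hidden in the challenge binary.
SECRET = "flag{example}"

# Candidate characters: digits, letters, and punctuation.
ALPHABET = string.digits + string.ascii_letters + string.punctuation


def check(guess: str) -> int:
    """Hypothetical oracle: how many leading characters of guess are right.

    A real solve script would launch the challenge executable here and
    parse whatever it prints instead of comparing against SECRET.
    """
    count = 0
    for got, want in zip(guess, SECRET):
        if got != want:
            break
        count += 1
    return count


def solve(oracle, alphabet=ALPHABET, max_length=64):
    """Recover the flag one character at a time using the oracle."""
    known = ""
    while len(known) < max_length:
        for ch in alphabet:
            # A candidate is correct if it raises the score past what we know.
            if oracle(known + ch) > len(known):
                known += ch
                break
        else:
            # No candidate improved the score: assume the flag is complete.
            break
    return known


recovered = solve(check)
```

Each position costs at most one oracle call per candidate character, so the total work grows with flag length times alphabet size rather than exponentially.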


At the end, I had a solution, got some points, snagged some free drinks from the organizers, and started on the next challenge.


Attack-defense contests give each team a collection of network services. Teams have to reverse engineer those services, patch their flaws, attack the same flaws in opponents’ instances, and continue to pass availability checks. The complex set of tasks that teams perform makes it a much more difficult event to run, but it provides a unique challenge for competitors.


DEF CON finals are traditionally an attack-defense game. 


Attack-defense games are usually broken into “rounds.” In each round, a “poller” checks to see if a team’s instance of a service is functioning correctly. 
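
A rough Python sketch of one round of availability checks, under stated assumptions: the names run_round and fake_poll are hypothetical, and the real poller's checks are not described here. The only idea it illustrates is that every team's instance is checked once per round and that a check which errors out counts as a failure.

```python
from typing import Callable, Dict, Iterable


def run_round(teams: Iterable[str], poll: Callable[[str], bool]) -> Dict[str, bool]:
    """Poll every team's service instance once and record whether it is up."""
    status = {}
    for team in teams:
        try:
            status[team] = bool(poll(team))
        except Exception:
            # An unreachable or crashing service counts as a failed check.
            status[team] = False
    return status


def fake_poll(team: str) -> bool:
    """Hypothetical check: 'alpha' works, 'bravo' misbehaves, 'charlie' crashes."""
    if team == "charlie":
        raise ConnectionError("service unreachable")
    return team == "alpha"


round_status = run_round(["alpha", "bravo", "charlie"], fake_poll)
```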


One of my favorite attack-defense services I’ve been party to was “Rubix,” in our DEF CON CTF 2017 finals. It was the first service we made available to teams on the first day of the event, running on a computer architecture teams had only had access to for 24 hours.


Rubix is a “Rubik’s Cube” puzzle simulator, requiring the user to produce fifty-four instructions to rotate a scrambled cube into a solved configuration. Once there, it evaluates the instructions as shellcode.


A member of the team Lab RATs posted a write-up, so we can follow along with how they solved it.


The work starts by figuring out how to interact with the challenge. Since it’s an architecture with 9-bit bytes, you need a way to talk to it that can transcode to the 8-bit bytes most of us expect from our computers. They made a version of netcat called “CLEMnc”.
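
To make the transcoding problem concrete, here is a small Python sketch that packs 9-bit values into ordinary 8-bit bytes and back. The functions pack9 and unpack9 are hypothetical, and the layout (most significant bit first, zero padding at the end) is an assumption for illustration; the wire format the real tool used is not described here.

```python
def pack9(values):
    """Pack 9-bit values into 8-bit bytes, most significant bit first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of bits currently waiting in acc
    out = bytearray()
    for value in values:
        if not 0 <= value < 512:
            raise ValueError(f"not a 9-bit value: {value}")
        acc = (acc << 9) | value
        nbits += 9
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        # Pad the final partial byte with zero bits on the right.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack9(data):
    """Unpack 8-bit bytes into 9-bit values, dropping trailing padding bits."""
    acc = 0
    nbits = 0
    out = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 9:
            nbits -= 9
            out.append((acc >> nbits) & 0x1FF)
        acc &= (1 << nbits) - 1
    return out


sample = [1, 256, 511, 0, 72]
round_trip = unpack9(pack9(sample))
```

Since padding never exceeds seven bits, unpacking cannot produce a spurious extra value at the end.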


Once they were more used to 9-bit bytes, they analyzed the binary, figuring out which part was the C standard library (or “libc”), and which libc we used. With that information, they can figure out what functions in libc we call.


Once all that’s done, they can start analyzing the challenge.


As teams analyzed Rubix, they were able to construct attacks against other teams to steal the flag out of memory. However, they also had to patch their own instance to defend from other teams’ attacks, which were intentionally very hard to distinguish from benign traffic.


How do teams actually score points? In our game, stealing and redeeming the flag from the service awarded points, while having your flag stolen or failing poller checks would lose points. 
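
As a sketch of that rule only, the Python below tallies one round's score changes. The point values and the name score_round are hypothetical; the actual values and formula used in the game are not given here.

```python
from collections import Counter

# Hypothetical point values, chosen only for illustration.
STEAL_REWARD = 1
STOLEN_PENALTY = 1
DOWN_PENALTY = 1


def score_round(steals, failed_polls):
    """Return each team's score change for one round.

    steals: iterable of (attacker, victim) pairs for redeemed flags.
    failed_polls: iterable of teams whose service failed the poller check.
    """
    delta = Counter()
    for attacker, victim in steals:
        delta[attacker] += STEAL_REWARD
        delta[victim] -= STOLEN_PENALTY
    for team in failed_polls:
        delta[team] -= DOWN_PENALTY
    return dict(delta)


changes = score_round([("alpha", "bravo"), ("charlie", "bravo")], ["charlie"])
```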






Hosting either kind of contest is an incredibly ambitious proposition: by necessity, we need to use computers to run a competition for more than a couple teams, but computer hacking is literally about disrupting, subverting, or otherwise breaking computer systems. 


If it doesn’t run smoothly, players will find it frustrating. If it’s not run fairly, players won’t put in a legit effort. If it doesn’t have fun and difficult challenges, players won’t enjoy it at all.


SMOOTH OPERATIONS
FAIR CONTEST
FUN CHALLENGES
Smooth Operations
Building and running any kind of project successfully starts long before any work has been done: it starts with the team building it.


Most of us who would eventually form Legitimate Business Syndicate either knew each other from university about thirteen years ago, or have worked together since then.


Gyno did the initial work in coming up with a team name, talking to people he wanted involved, and arranging for us to meet and put together a proposal, during several months from December 2012 to February 2013. 


Our team ended up having an extremely diverse set of skills. 


I could easily lump about three quarters of the group together as “reverse engineers,” but they specialized in more than just that: we relied heavily on our hardware and radio enthusiasts for some challenges, and we completely depended on our esoteric computing expert for our 2017 game. 


A CTF needs to run on networked computing infrastructure, and our network and computing infrastructure expert was vital.


Our CTF game was basically a database-backed web application, which matches my background.


Communication is extremely important too. Real-time chat is great for stuff you can talk about asynchronously, and weekly meetings are useful, but for important events, we worked to get together in real life. For qualifiers, we met either at an office, someone’s house, or last year, a big rental house. The run-up to our finals game in Las Vegas meant picking a hotel room to fill with servers and snacks and use as a communal hack space.


A team isn’t just skills and techniques. Everyone on the team depends on each other! You have to get along, and if you want to get along successfully, make sure everyone’s well-fed, has plenty to drink, and gets adequate downtime.


One year, on the Saturday afternoon of DEF CON, we were having persistent problems with a team’s computer. Selir, our infrastructure expert, was running between his computer and the server rack trying to troubleshoot this particular flaw. I was up on stage as well, because the failure was delaying the normal progress of the game, which was my responsibility. At one point, I asked when it would be fixed, and, sparing me from the built-up frustration and pressure from multiple sources, Selir took a walk.


He did come back, some time later, having eaten lunch and figured out a path forward from the problem, which is great, because Selir basically ran the competition. What I remember now is that caring for your teammates is the best thing you can do to care for competitors.


One last thing I should mention is that your CTF software is like any other software in that you can and should have automated tests and automated deployment.
Fair Contest
CTF services are unique among programs. Their purpose is to be broken down, analyzed, and manipulated from the outside. This adds a substantial risk, because a service under attack might be affected in a way that damages its availability for other players, or damages our hosting infrastructure.


Our most effective way of protecting the game from its challenges was to heavily restrict what targeted services can do. In qualifiers, we ran services on separate hosts, in separate Linux containers per connection, with strict restrictions on what system calls they could make. 


For finals, it became a lot more complicated, because not only do services get attacked, but teams are required to patch those services to make them more difficult to attack. In order to make sure the game stayed about reverse engineering, we started down a path to restrict what patched services could do, and then we were given a solution.


Starting in our 2013 finals game, we limited teams’ access to their servers, giving them an unprivileged login account, and running services in unprivileged accounts. It mostly worked, but we learned about a technique we called a “Superman defense,” or a defense that simply can’t be attacked. For example, wrapping a vulnerable service in an emulator that restricts its ability to read the flag off disk is a Superman defense.


With that knowledge, we tried to have both technical measures and rules about Superman defenses in 2014, but 


In 2014, the US Defense Advanced Research Projects Agency, or DARPA, started planning an attack-defense game called the Cyber Grand Challenge. The Cyber Grand Challenge, or “CGC,” was a CTF designed to be played entirely by autonomous computers.


In order for this to work, CGC had to be very formal about many aspects of CTF. Executables in CGC were all 32-bit x86 binaries, but they used a special executable format with extra provisions for automated analysis, such as a severely limited number of system calls and fixed locations in memory for parts of the executable.


Instead of teams running their own instances, the scoring system orchestrated binaries on its own computing resources. Players downloaded vulnerable services from the scoring system, and uploaded patched replacements to the scoring system. Instead of launching an exploit, players would upload a “proof of vulnerability” executable that would run against opponents on the scoring system’s schedule.


We ended up using a lot of their research for our game. Starting in 2015, we limited allowable system calls for services. 


In 2016, the day after the CGC contest, we ran with a reimplementation of their contest, including pitting the computer that won CGC against fifteen teams of humans.


Finally, in 2017, we ran the game on an emulated platform with limited access to the actual computer behind it. Because teams had no access to information about the architecture until a day before the game, tooling development started on equal footing. Since the services ran in an emulator without a bunch of hooks into the messy real world, there were limited things teams could do to subvert the contest, making it more fair.


One thing we did year-to-year was release as much information about the game as reasonable. For qualifiers, this meant scores were made public, and for finals, we dumped the whole scoring database and collection of packet captures online for public analysis. Over the years, we received a lot of valuable public feedback about our scoring systems that improved future games.


Fairness beyond challenges means keeping score fairly, but it can also mean accessibility. We learned to make the language in-game and on our website less dependent on US online culture, because not every player is immersed in it, and that’s not what our contest was supposed to be about. We learned to avoid timing sensitive challenges in qualifiers, because not every team had a low latency connection to where we hosted the game.


There’s a fine line between extra work that’s external to the competition, and extra work that’s under evaluation. When we first introduced our game in 2013, we were the first DEF CON CTF organizers to run Linux challenges on ARM computers. This meant that some teams had the wrong tooling, some teams had hedged their bets with ARM tools, and other teams were able to adapt and overcome.
Fun Challenges
Breaking expectations leads to fun and memorable challenges. My favorite example of this is Lightning’s “dosfun4u” challenges from our 2014 qualifiers. The first two steps in one write-up for it are “discover that it’s a DOS binary” (that DOS, from the ’80s and early ’90s), and “debug the IDA Pro disassembly tool.”


Another challenge that got good reviews was “badger,” an MSP-430 service running on custom badge hardware with an image decompression vulnerability. This was possibly the most complex service we’ve ever deployed. It required board design, hours of soldering, and running a radio network.


Our 2016 and 2017 finals games were built around “consensus evaluation,” an attack-defense variant developed by DARPA for CGC. In consensus evaluation games, teams don’t just see the original binary, but every version their opponents worked on. Since this means an explosion in the number of binaries teams would have to analyze, I decided to write challenges for the qualifiers that required automated analysis. 


How do you force automated analysis on teams? Give them hundreds of binaries to solve! We got a pile of solutions, using different tools, and a few write-ups that talk about how they’d never done that kind of challenge before, which counts as a success to me.


Consensus evaluation brings a lot of excitement to attack-defense games. 


Our 2016 game was most teams’ first exposure to consensus evaluation. On the Saturday of the game, a player came to our table to ask about why their team was losing points. We checked the scoring system internals, and told them it was because they were “completely owned,” i.e. their service was vulnerable and being attacked successfully. They claimed that wasn’t possible, because they were using the same binaries as the leading team, who were not being attacked. 


Then they thought about it for a couple seconds, facepalmed, and walked back to their table. You can use other teams’ binaries in a consensus evaluation game, but they may have left a back door in them.


Our finals game in 2017 had an extra bonus. Since teams could reverse engineer others’ updated or patched services, they could attack the patched versions. More than one player came to us after the game was over to comment how they felt like they had a back-and-forth with a competitor, taking turns patching and building attacks, and it finally felt like a multiplayer game against other humans.


While I wish we had discovered that five years ago, I’m very happy we discovered that at all.
Conclusions
Looking back, what did we learn?


Computer hacking Capture The Flag is still very young. There are countless improvements to the game that just need to be discovered, and lots of opportunity to make it more accessible to a larger community.


The two best ways to get experience with CTF are playing them and running them.